An Approximate L1-Difference Algorithm for Massive Data Streams
Authors
Abstract
We give a space-efficient, one-pass algorithm for approximating the L1 difference $\sum_i |a_i - b_i|$ between two functions, when the function values $a_i$ and $b_i$ are given as data streams and their order is chosen by an adversary. Our main technical innovation is a method of constructing families $\{V_j\}$ of limited-independence random variables that are range-summable, by which we mean that $\sum_{j=0}^{c-1} V_j(s)$ is computable in time polylog(c) for all seeds $s$. These random-variable families may be of interest outside our current application domain, i.e., massive data streams generated by communication networks. Our L1-difference algorithm can be viewed as a “sketching” algorithm, in the sense of [Broder, Charikar, Frieze, and Mitzenmacher, STOC ’98, pp. 327–336], and our algorithm performs better than that of Broder et al. when used to approximate the symmetric difference of two sets with small symmetric difference.
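The range-summability property can be illustrated with a much simpler family than the one constructed in the paper. The following Python sketch is an illustration under stated assumptions, not the authors' construction: it uses the Walsh functions $V_j(s) = (-1)^{\mathrm{popcount}(j \wedge s)}$, where $j \wedge s$ is the bitwise AND of the index and a $b$-bit seed. Apart from the constant $V_0 \equiv 1$, this family is only pairwise independent, whereas the usual variance argument for a sketch built from a few random seeds relies on 4-wise independence. `prefix_sum` computes $\sum_{j=0}^{c-1} V_j(s)$ in $O(\log c)$ steps by splitting $[0, c)$ into dyadic blocks, each of which sums either to $0$ or to $\pm 2^k$. To connect this to the L1 difference, the sketch assumes a hypothetical unary encoding in which value $v$ at index $i$becomes the interval of indices $[im, im + v)$ for a bound $m$ on the values; under that encoding $\sum_i |a_i - b_i|$ is the size of a symmetric difference, and range-summability lets each stream item update the linear sketch $Z(s)$ in polylogarithmic rather than linear time. The function names `walsh`, `prefix_sum`, `interval_sum`, `sketch` and the parameter `m` are all hypothetical.

```python
from typing import Iterable, List, Tuple


def walsh(j: int, s: int) -> int:
    """V_j(s) = (-1)^popcount(j & s), a +/-1 value determined by index j and seed s."""
    return -1 if bin(j & s).count("1") % 2 else 1


def prefix_sum(c: int, s: int) -> int:
    """Sum of V_j(s) for 0 <= j < c, using O(log c) dyadic blocks."""
    total, base = 0, 0
    for k in range(c.bit_length() - 1, -1, -1):
        if (c >> k) & 1:
            # Block [base, base + 2^k): the low k bits of j take every value, so the
            # block sums to 2^k * V_base(s) if the low k bits of s are all 0, else to 0.
            if s & ((1 << k) - 1) == 0:
                total += (1 << k) * walsh(base, s)
            base += 1 << k
    return total


def interval_sum(lo: int, hi: int, s: int) -> int:
    """Sum of V_j(s) for lo <= j < hi."""
    return prefix_sum(hi, s) - prefix_sum(lo, s)


def sketch(stream: Iterable[Tuple[str, int, int]], s: int, m: int) -> int:
    """One-pass linear sketch Z(s) of ("a" or "b", i, value) items.

    Hypothetical unary encoding: value v at index i covers indices [i*m, i*m + v),
    assuming 0 <= v <= m. Items from "a" add and items from "b" subtract, so the
    arrival order does not change Z(s).
    """
    z = 0
    for which, i, v in stream:
        sign = 1 if which == "a" else -1
        z += sign * interval_sum(i * m, i * m + v, s)
    return z


a = [3, 0, 5, 2]
b = [1, 4, 5, 0]
m, bits = 8, 5  # every encoded index is below 4 * 8 = 32 = 2**5
stream: List[Tuple[str, int, int]] = (
    [("a", i, v) for i, v in enumerate(a)] + [("b", i, v) for i, v in enumerate(b)]
)
# Averaging Z(s)^2 over all 2**bits seeds gives sum_i |a_i - b_i| exactly.
mean_sq = sum(sketch(stream, s, m) ** 2 for s in range(2 ** bits)) / 2 ** bits
print(mean_sq)  # 8.0
```

Because $Z(s)$ is linear in the stream items, the adversary's choice of order does not affect it. Averaging $Z(s)^2$ over all $2^b$ seeds only checks that the estimator is unbiased (a consequence of the orthogonality of the Walsh functions) and costs time exponential in $b$; a space-efficient sketch would instead keep $Z(s)$ for a few independently chosen seeds, which is where independence stronger than pairwise is needed.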
Similar resources
An Approximate L1-Difference Algorithm for Massive Data Streams
Massive data sets are increasingly important in a wide range of applications, including observational sciences, product marketing, and monitoring and operations of large systems. In network operations, raw data typically arrive in streams, and decisions must be made by algorithms that make one pass over each stream, throw much of the raw data away, and produce “synopses” or “sketches” for furth...
An Approximate Lp-Difference Algorithm for Massive Data Streams
Several recent papers have shown how to approximate the difference $\sum_i |a_i - b_i|$ or $\sum_i (a_i - b_i)^2$ between two functions, when the function values $a_i$ and $b_i$ are given in a data stream, and their order is chosen by an adversary. These algorithms use little space (much less than would be needed to store the entire stream) and little time to process each item in the stream. They approximate with small rel...
Streaming Algorithms for Distributed, Massive Data Sets
Massive data sets are increasingly important in a wide range of applications, including observational sciences, product marketing, and monitoring and operations of large systems. In network operations, raw data typically arrive in streams, and decisions must be made by algorithms that make one pass over each stream, throw much of the raw data away, and produce “synopses” or “sketches” for furth...
Massive Data Streams Research: Where to Go
This phenomenon has challenged how we store, communicate and compute with data. Theories developed over the past 50 years have relied on full capture, storage and communication of data. Instead, what we need for managing modern massive data streams are new methods built around working with less. The past 10 years have seen new theories emerge in computing (data stream algorithms), communication (co...
Chapter 9 MINING TEXT STREAMS
The large amount of text data that is continuously produced over time in a variety of large scale applications such as social networks results in massive streams of data. Typically massive text streams are created by very large scale interactions of individuals, or by structured creations of particular kinds of content by dedicated organizations. An example in the latter category would be the...